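The pre-training of large language models usually requires massive resources, both in terms of computation and data. Frequently used web sources such as Common Crawl can contain enough noise to make this pre-training sub-optimal. In this work, we experiment with different sampling methods on the Spanish version of mC4 and present a novel data-centric technique, which we name perplexity sampling, that enables the pre-training of language models in roughly half the number of steps while using one fifth of the data. The resulting models are comparable to the current state of the art and even achieve better results on some tasks. Our work demonstrates the versatility of Transformers and paves the way for small teams to train models on a limited budget. Our models are available at https://huggingface.co/bertin-project.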
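The abstract above does not say how perplexity sampling is implemented. The following is a minimal sketch under two assumptions. First, a toy add-one-smoothed unigram model stands in for the reference language model. Second, each document is kept with a Gaussian weight centered on the median perplexity, so that both very repetitive (low-perplexity) and very noisy (high-perplexity) documents are down-weighted. The function names and the `width` parameter are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def train_unigram(reference_docs):
    """Fit an add-one-smoothed unigram model on a small, clean reference corpus.

    Assumption: a toy unigram LM stands in for whatever reference model is really used.
    """
    counts = Counter(tok for doc in reference_docs for tok in doc.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def log_prob(token):
        return math.log((counts[token] + 1) / (total + vocab_size))

    return log_prob


def perplexity(doc, log_prob):
    """exp of the mean negative log-likelihood per token."""
    tokens = doc.split()
    if not tokens:
        return float("inf")
    return math.exp(-sum(log_prob(t) for t in tokens) / len(tokens))


def gaussian_weight(ppl, center, width):
    """Keep-probability that peaks at `center` and decays in log-perplexity space."""
    return math.exp(-((math.log(ppl) - math.log(center)) ** 2) / (2 * width**2))


def perplexity_sample(docs, log_prob, width=0.5, seed=0):
    """Keep each document with probability equal to its (hypothetical) Gaussian weight."""
    rng = random.Random(seed)
    ppls = [perplexity(d, log_prob) for d in docs]
    center = statistics.median(ppls)
    return [d for d, p in zip(docs, ppls) if rng.random() < gaussian_weight(p, center, width)]
```

A document at the median perplexity has weight 1 and is always kept. Documents whose perplexity is far from the median in either direction are kept only rarely. A smaller `width` makes the filter more aggressive.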
Uncertainty quantification is crucial in inverse problems, as it can provide decision-makers with valuable information about the inversion results. For example, seismic inversion is a notoriously ill-posed inverse problem due to the band-limited and noisy nature of seismic data. It is therefore of paramount importance to quantify the uncertainties associated with the inversion process to ease the subsequent interpretation and decision-making processes. In this context, sampling from a target posterior provides a fundamental approach to quantifying the uncertainty in seismic inversion. However, selecting appropriate prior information in a probabilistic inversion is crucial, yet non-trivial, as it influences the ability of sampling-based inference to provide geologically realistic posterior samples. To overcome such limitations, we present a regularized variational inference framework that performs posterior inference by implicitly regularizing the Kullback-Leibler divergence loss with a CNN-based denoiser by means of Plug-and-Play methods. We call this new algorithm Plug-and-Play Stein Variational Gradient Descent (PnP-SVGD) and demonstrate its ability to produce high-resolution, trustworthy samples representative of the subsurface structures, which we argue could be used for post-inference tasks such as reservoir modelling and history matching. To validate the proposed method, numerical tests are performed on both synthetic and field post-stack seismic data.
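For reference, here is the plain Stein Variational Gradient Descent update that PnP-SVGD builds on, sketched in one dimension. This is vanilla SVGD with a fixed-bandwidth RBF kernel on a toy Gaussian posterior. The CNN-denoiser regularization that defines PnP-SVGD is not reproduced. The target, the bandwidth `h`, the step size, and the function names are illustrative assumptions.

```python
import math
import random


def rbf(x, y, h):
    """RBF kernel k(x, y) = exp(-(x - y)^2 / h)."""
    return math.exp(-((x - y) ** 2) / h)


def svgd_step(particles, grad_log_p, h=1.0, step=0.1):
    """One vanilla SVGD update: kernel-weighted score (attraction) plus kernel gradient (repulsion)."""
    n = len(particles)
    scores = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, sj in zip(particles, scores):
            k = rbf(xj, xi, h)
            # d/dxj k(xj, xi) = -2 (xj - xi) / h * k, which pushes xi away from xj
            phi += k * sj - 2.0 * (xj - xi) / h * k
        updated.append(xi + step * phi / n)
    return updated


def run_svgd(grad_log_p, n=40, iters=400, seed=0):
    """Hypothetical driver: start from a deliberately poor initialization and iterate."""
    rng = random.Random(seed)
    particles = [rng.gauss(-3.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        particles = svgd_step(particles, grad_log_p)
    return particles
```

For a toy posterior N(2, 1), call `run_svgd(lambda x: -(x - 2.0))`. The returned particles should drift toward a mean near 2 while the repulsion term keeps them spread out rather than collapsing onto the mode. In practice the bandwidth is often chosen with a median heuristic rather than being fixed.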
Social insects such as ants communicate via pheromones, which allow them to coordinate their activity and solve complex tasks as a swarm, e.g. foraging for food. This behaviour was shaped through evolutionary processes. In computational models, self-coordination in swarms has been implemented using probabilistic or action rules to shape the decision of each agent and the collective behaviour. However, manually tuned decision rules may limit the behaviour of the swarm. In this work, we investigate the emergence of self-coordination and communication in evolved swarms without defining any rules. We evolve a swarm of agents representing an ant colony. We use a genetic algorithm to optimize a spiking neural network (SNN) that serves as an artificial brain controlling the behaviour of each agent. The goal of the colony is to find optimal ways to forage for food in the shortest amount of time. During the evolutionary phase, the ants learn to collaborate by depositing pheromone near food piles and near the nest to guide their cohorts. Pheromone usage is not encoded into the network; instead, this behaviour is established through the optimization procedure. We observe that pheromone-based communication enables the ants to perform better than colonies in which communication did not emerge. We assess foraging performance by comparing the SNN-based model to a rule-based system. Our results show that the SNN-based model completes the foraging task more efficiently and in a shorter time. Our approach illustrates that, even in the absence of predefined rules, self-coordination via pheromone emerges as a result of the network optimization. This work serves as a proof of concept for creating complex applications that use SNNs as the underlying architecture for multi-agent interactions where communication and self-coordination are desired.
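The ant-colony model described above is far larger, but its core loop can be sketched on a toy problem: a genetic algorithm searching over the weights of a spiking network, with no hand-written behaviour rules. Everything here is an assumption for illustration, not the paper's setup:

- a single leaky integrate-and-fire neuron;
- a two-input "food / no food" task;
- truncation selection with elitism;
- uniform crossover and Gaussian mutation.

```python
import random


def lif_spike_count(weights, inputs, steps=10, decay=0.9, threshold=1.0):
    """Count spikes of one leaky integrate-and-fire neuron driven by a constant input."""
    current = sum(w * x for w, x in zip(weights, inputs))
    v, spikes = 0.0, 0
    for _ in range(steps):
        v = decay * v + current  # leak, then integrate
        if v >= threshold:
            spikes += 1
            v = 0.0  # reset the membrane potential after a spike
    return spikes


def fitness(weights):
    """Hypothetical task: fire when 'food' is sensed, stay silent otherwise."""
    return lif_spike_count(weights, [1.0, 0.0]) - lif_spike_count(weights, [0.0, 1.0])


def evolve(fitness_fn, n_weights, pop_size=20, generations=30, sigma=0.3, seed=0):
    """Toy GA over weight vectors; returns the best individual and per-generation best fitness."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 0.5) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        history.append(fitness_fn(ranked[0]))
        parents = ranked[: pop_size // 4]  # truncation selection; parents survive (elitism)
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation
            children.append([rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(p1, p2)])
        population = parents + children
    return max(population, key=fitness_fn), history
```

Because the parents survive unchanged, the best fitness recorded in `history` never decreases from one generation to the next. The spiking behaviour itself is never specified by hand; it emerges only from selection on `fitness`.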
Self-supervised learning (SSL) learns useful representations from unlabelled data by training networks to be invariant to pairs of augmented versions of the same input. Non-contrastive methods avoid collapse either by directly regularizing the covariance matrix of network outputs or through asymmetric loss architectures, two seemingly unrelated approaches. Here, by building on DirectPred, we lay out a theoretical framework that reconciles these two views. We derive analytical expressions for the representational learning dynamics in linear networks. By expressing them in the eigenspace of the embedding covariance matrix, where the solutions decouple, we reveal the mechanism and conditions that provide implicit variance regularization. These insights allow us to formulate a new isotropic loss function that equalizes the contributions of the eigenvalues and makes learning more robust. Finally, we show empirically that our findings translate to nonlinear networks trained on CIFAR-10 and STL-10.
Knowledge graphs, which model multi-relational data, improve numerous applications such as question answering or graph logical reasoning. Many graph neural networks for such data have emerged recently, often outperforming shallow architectures. However, the design of such multi-relational graph neural networks is ad hoc, driven mainly by intuition and empirical insights. Up to now, their expressivity, their relation to each other, and their (practical) learning performance are poorly understood. Here, we take a first step toward a more principled understanding of multi-relational graph neural networks. Namely, we investigate the limitations in the expressive power of the well-known Relational GCN and Compositional GCN architectures and shed some light on their practical learning performance. By aligning both architectures with a suitable version of the Weisfeiler-Leman test, we establish under which conditions both models have the same expressive power in distinguishing non-isomorphic (multi-relational) graphs or vertices with different structural roles. Further, by leveraging recent progress in designing expressive graph neural networks, we introduce the $k$-RN architecture, which provably overcomes the expressiveness limitations of these two architectures. Empirically, we confirm our theoretical findings in a vertex classification setting over small and large multi-relational graphs.
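The abstract above relates R-GCN and CompGCN to "a suitable version" of the Weisfeiler-Leman test without spelling that version out. The sketch below assumes a natural relational variant of 1-WL color refinement, in which each vertex aggregates the multiset of (relation, neighbor color) pairs on its incoming edges. The paper's exact variant may differ, and the function names are hypothetical.

```python
from collections import Counter


def relational_wl(num_nodes, edges, rounds=3):
    """Color refinement where each message is a (relation, neighbor color) pair.

    edges: list of (src, relation, dst) triples; messages flow src -> dst.
    Assumption: this relational 1-WL variant is illustrative, not the paper's definition.
    """
    colors = [0] * num_nodes
    for _ in range(rounds):
        incoming = [[] for _ in range(num_nodes)]
        for src, rel, dst in edges:
            incoming[dst].append((rel, colors[src]))
        # New color = old color plus the sorted multiset of typed incoming messages
        signatures = [(colors[v], tuple(sorted(incoming[v]))) for v in range(num_nodes)]
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        colors = [palette[sig] for sig in signatures]
    return colors


def wl_indistinguishable(graph_a, graph_b, rounds=3):
    """Compare color histograms, refining the disjoint union so both graphs share one palette."""
    n_a, edges_a = graph_a
    n_b, edges_b = graph_b
    shifted = [(s + n_a, r, d + n_a) for s, r, d in edges_b]
    colors = relational_wl(n_a + n_b, edges_a + shifted, rounds)
    return Counter(colors[:n_a]) == Counter(colors[n_a:])
```

Consider the directed paths 0 -a-> 1 -b-> 2 and 0 -b-> 1 -a-> 2. They receive different color histograms after two rounds, whereas a relabelled copy of the first path does not. A relation-blind 1-WL would not separate the two paths, since they are identical once edge types are ignored.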
Counterfactual Explanations are becoming a de facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified in an undesired class, its counterfactual explanation corresponds to small perturbations of that instance that change the classification outcome. This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is used to build a supervised discretization of the features in the dataset with a tunable granularity. Using the discretized dataset, a smaller, and therefore more interpretable, Decision Tree can be trained, which is also more stable and robust than the baseline Decision Tree. Numerical results on real-world datasets show the effectiveness of the approach in terms of accuracy and sparsity compared to the baseline Decision Tree.
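One way to read the pipeline above is as follows. For each feature, search for the smallest change that flips the black box's prediction. Those flip points are where the decision boundary crosses that feature, so they make natural cut points. The sketch below makes two assumptions: counterfactuals are single-feature changes found by bisection within known feature bounds, and a `max_cuts` cap plays the role of the tunable granularity. The abstract does not describe the paper's counterfactual generator or its threshold-selection rule, and all names here are hypothetical.

```python
import bisect


def feature_counterfactual(model, x, feature, lo, hi, tol=1e-4):
    """Smallest single-feature change (found by bisection) that flips model(x), or None.

    Assumption: a simple one-feature search stands in for a real counterfactual generator.
    """
    original = model(x)
    best = None
    for target in (lo, hi):
        probe = list(x)
        probe[feature] = target
        if model(probe) == original:
            continue  # no flip in this direction within the feature's range
        a, b = x[feature], target  # invariant: label unchanged at a, flipped at b
        while abs(b - a) > tol:
            mid = (a + b) / 2
            probe[feature] = mid
            if model(probe) == original:
                a = mid
            else:
                b = mid
        if best is None or abs(b - x[feature]) < abs(best - x[feature]):
            best = b
    return best


def boundary_thresholds(model, data, bounds, max_cuts, digits=3):
    """Per-feature cut points taken from counterfactuals; max_cuts sets the granularity."""
    if max_cuts < 1:
        raise ValueError("max_cuts must be at least 1")
    thresholds = []
    for f, (lo, hi) in enumerate(bounds):
        found = (feature_counterfactual(model, x, f, lo, hi) for x in data)
        cuts = sorted({round(c, digits) for c in found if c is not None})
        if len(cuts) > max_cuts:
            stride = len(cuts) / max_cuts
            cuts = [cuts[int(i * stride)] for i in range(max_cuts)]
        thresholds.append(cuts)
    return thresholds


def discretize(x, thresholds):
    """Map each feature value to the index of the interval it falls in."""
    return [bisect.bisect_right(cuts, v) for v, cuts in zip(x, thresholds)]
```

The integer codes produced by `discretize` would then be the input to an ordinary decision-tree learner (not shown). Lowering `max_cuts` yields coarser features and, typically, a smaller tree.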
Double-blind peer review is considered a pillar of academic research because it is perceived to ensure a fair, unbiased, and fact-centered scientific discussion. Yet, experienced researchers can often correctly guess from which research group an anonymous submission originates, biasing the peer-review process. In this work, we present a transformer-based neural network architecture that uses only the text content and the author names in the bibliography to attribute an anonymous manuscript to an author. To train and evaluate our method, we created the largest authorship-identification dataset to date. It leverages all research papers publicly available on arXiv, amounting to over 2 million manuscripts. In arXiv subsets with up to 2,000 different authors, our method achieves unprecedented authorship attribution accuracy, with up to 95% of papers attributed correctly. Thanks to our method, we can not only predict the author of an anonymous work but also identify weaknesses of the double-blind review process by finding the key aspects that make a paper attributable. We believe that this work gives valuable insights into how a submission can remain anonymous in order to support an unbiased double-blind review process.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
In a spoofing attack, an attacker impersonates a legitimate user to access or tamper with data intended for or produced by the legitimate user. In wireless communication systems, these attacks may be detected by relying on features of the channel and transmitter radios. In this context, a popular approach is to exploit the dependence of the received signal strength (RSS) at multiple receivers or access points on the spatial location of the transmitter. Existing schemes rely on long-term estimates, which makes it difficult to distinguish spoofing from movement of a legitimate user. Here, this limitation is addressed by means of a deep neural network that implicitly learns the distribution of pairs of short-term RSS vector estimates. The adopted network architecture imposes the invariance to permutations of the input (commutativity) that the decision problem exhibits. The merits of the proposed algorithm are corroborated on a data set that we collected.
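The abstract above states only that the architecture builds in the symmetry of the decision problem. Below is a minimal sketch of one way to do that, under these assumptions:

- the input is an unordered pair of short-term RSS vectors (one normalized entry per access point);
- each vector is embedded with shared weights;
- the two embeddings are pooled with order-independent operations (element-wise sum and absolute difference).

The class name, layer sizes, and untrained random weights are hypothetical, and training is not shown.

```python
import math
import random


class SymmetricPairScorer:
    """Scores a pair (a, b) so that score(a, b) == score(b, a) by construction.

    Assumption: an illustrative, untrained architecture, not the paper's network.
    """

    def __init__(self, dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w_embed = [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(hidden)]
        self.w_out = [rng.gauss(0, 1 / math.sqrt(2 * hidden)) for _ in range(2 * hidden)]

    def embed(self, rss):
        """Shared embedding applied identically to both RSS vectors."""
        return [math.tanh(sum(w * r for w, r in zip(row, rss))) for row in self.w_embed]

    def score(self, a, b):
        ha, hb = self.embed(a), self.embed(b)
        # Sum and absolute difference are both unchanged when a and b are swapped
        features = [x + y for x, y in zip(ha, hb)] + [abs(x - y) for x, y in zip(ha, hb)]
        logit = sum(w * f for w, f in zip(self.w_out, features))
        return 1 / (1 + math.exp(-logit))  # hypothetical P(spoofing)
```

Because both pooled features are symmetric in `a` and `b`, commutativity holds exactly for any weights. The network never has to learn this symmetry from data, which is the design benefit the abstract points to.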
Given a particular embodiment, we propose a novel method (C3PO) that learns policies able to achieve any arbitrary position and pose. Such a policy would allow for easier control and would be reusable as a key building block for downstream tasks. The method is two-fold. First, we introduce a novel exploration algorithm that optimizes for uniform coverage and is able to discover a set of achievable states, and we investigate its ability to attain both high coverage and hard-to-discover states. Second, we leverage this set of achievable states as training data for a universal goal-achievement policy, a goal-based SAC variant. We demonstrate the trained policy's performance in achieving a large number of novel states. Finally, we showcase the influence of massive unsupervised training of a goal-achievement policy with state-of-the-art pose-based control of the Hopper, Walker, Halfcheetah, Humanoid and Ant embodiments.
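The abstract above does not detail C3PO's exploration algorithm. As a stand-in, the sketch below uses a simple count-based, Go-Explore-style heuristic, which is explicitly not C3PO's algorithm:

1. Discretize states into grid cells.
2. Repeatedly restart from the least-visited discovered cell.
3. Keep one reachable state per cell.

The resulting list plays the role of the "set of achievable states" that would then serve as goals for a goal-conditioned SAC policy (not shown). The toy point-mass embodiment, the cell size, and all names are assumptions.

```python
import random
from collections import defaultdict


def explore_uniform(step_fn, start, cell_size=0.5, iterations=200, rollout=5, seed=0):
    """Count-based exploration: always restart from the least-visited discovered cell.

    Assumption: a Go-Explore-style stand-in, not the exploration algorithm from the paper.
    """
    rng = random.Random(seed)

    def cell(state):
        return tuple(int(c // cell_size) for c in state)

    archive = {cell(start): start}  # one representative reachable state per cell
    visits = defaultdict(int)
    for _ in range(iterations):
        # Choosing the rarest cell (random tie-break) spreads effort toward uniform coverage
        target = min(archive, key=lambda c: (visits[c], rng.random()))
        visits[target] += 1
        state = archive[target]
        for _ in range(rollout):
            state = step_fn(state, rng)
            archive.setdefault(cell(state), state)
    return list(archive.values())


def point_mass_step(state, rng):
    """Toy embodiment: a point in the box [0, 5]^2 moved by a random bounded action."""
    return tuple(min(5.0, max(0.0, c + rng.uniform(-1.0, 1.0))) for c in state)
```

For example, `explore_uniform(point_mass_step, (0.0, 0.0))` returns states scattered across the box rather than clustered near the start. Each returned state is, by construction, actually reachable, which is what makes the list usable as a goal set.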